All Questions

0 votes
5 answers
158 views

How to bring the next line to the end of the first line using awk, separated by a comma?

I have downloaded some sequences from a publicly available database in .fa format. I want to generate a .csv file that contains the name of the sequence, and its length, separated by a comma. This is ...
Hrishikesh Hardikar
-1 votes
4 answers
2k views

Grep for a range of numbers

I have a .txt file with multiple lines that gives amino acid and residue data. The data looks like this: ARG262-Side ASP368-Side 140,83% ARG95-Side GLU107-Side 103,73% ARG474-Side VAL468-Main 94,93% ...
Derman Basturk
1 vote
3 answers
94 views

Is there a way to extract the lines that 3 files have in common, based on one column?

I have 3 space-separated files with about 3.4 million lines each (they don't have exactly the same number of lines, and they are sorted by the "Marker" column). They look like this: head neu1 ...
anamaria
  • 121
-1 votes
5 answers
3k views

How to filter a column in a TSV file with billions of rows

I am working with a list containing billions of rows of data. I have data like this: As you can see, the fourth column (the gene column) contains gene names, but not all rows have a "gene name". I need to get ...
Lulu' Nisrina
0 votes
7 answers
1k views

Simplest way to extract a portion of a string?

I've got a file (bigfile.txt); one of the columns looks like this: NW_017095471.1 Gnomon mRNA 108321 109565 . + . ID=rna34;Parent=gene27;Dbxref=GeneID:108565285,Genbank:XM_017925071.1;...
R-MASHup
  • 115
8 votes
3 answers
15k views

Extract lines that match a list of words in another file

I have file 1, which has these lines: ATM 1434.972183 BMPR2 10762.78192 BMPR2 10762.78192 BMPR2 1469.14535 BMPR2 1469.14535 BMPR2 1738.479639 BMS1 4907.841667 BMS1 4907.841667 BMS1 880.4532628 BMS1 ...
LamaMo
  • 223
2 votes
3 answers
6k views

Print text before and after match, from a specific beginning and to an ending string

I am trying to extract entries from within a large Genbank file, with many thousands of entries. For a search string, I'm using a unique gene name – that works fine. The tricky bit is that I'd like to ...
anth
  • 21
3 votes
3 answers
780 views

Comparing two files linewise and if pattern of file 1 is not found (fully or partially) in file 2 then print line of file 1

I have two files: file1 (search): 1 GACGGAGGATGCAAGTGTTATCCGGAATCACTGGGCGTAAAGTTTTTTTTT 2 GACGGAGGATGCAAGTGTTATCCGGAAT 3 GACGGAGGATGCAAGTGTTATCCGGAATCACTGGGCGTAAAGCGTCC 4 ...
MSt
  • 31
0 votes
2 answers
97 views

Extract value from formatted text with bash

I've got a .fasta file, which is strictly formatted text containing some information about DNA. Here's its common structure: >NODE_18_length_75451_cov_83.3021 ...
Shred
  • 133
0 votes
3 answers
700 views

Extract a block of text from another file

I want to extract blocks of text based on IDs present in another file. Input: >Feature scaffold1 1 100 g 101 200 g 201 300 g 500 500 r 900 1000 r >Feature scaffold2 1 100 g 01 500 g 200 ...
Namrata Patel
0 votes
2 answers
698 views

Find and replace lines in text file with output from another file

I have two files A and B. File A >Node1 ... >Node2 ... File B >gb|KY551314.1| Influenza A virus (A/mallard/Idaho/AH0011522/2015(H7N7)) segment 2 polymerase PB1 (PB1) and ...
ChrisD
  • 103
1 vote
2 answers
174 views

Deconstructing one line into two lines based on specific columns

I have a .tsv file (batch_1.catalog.tags.tsv) consisting of 1,965,056 lines of 14 columns. I want to break some of these into two lines. The first line: starts with a greater than sign (>) ...
Age87
  • 559
2 votes
0 answers
395 views

Use gff2fasta instead of a bash script to get parts of DNA sequences out of a full genome

EDIT and a solution: Because my original question was badly phrased and I was trying to re-invent the wheel, I am answering my own question now (maybe it helps someone else): gff2fasta is a tool which ...
gugy
  • 163